
Update h2 windowing algo & Http Client benchmark #388

Draft · @TingDaoK wants to merge 43 commits into main

Conversation

@TingDaoK (Contributor) commented Aug 26, 2022

  • Initial build of our http client benchmark
  • Test against a local host: our HTTP client connects to that host and we collect how many requests are made within a certain amount of time.
  • To run:

ISSUE FOUND & FIXED

  • We were updating the connection window for every DATA frame received, which really slows us down when data arrives as many small, frequent DATA frames.

    • We fixed it by only updating the connection window once it drops below 50% of the max (see the sketch after this list).
    • The same issue may apply to stream windows,
    • or to the padding of the connection window.
  • "Providing tiny increments to flow control in WINDOW_UPDATE frames can cause a sender to generate a large number of DATA frames." from here

    • Is it the client's responsibility to make sure this doesn't happen, even when the user does manual window updates with small increments?
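
For illustration, here is a minimal standalone sketch of the batching rule described above. The struct, names, and demo values are hypothetical; the actual change lives in the connection's thread_data fields shown in the diffs below.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* 2^31 - 1: the largest flow-control window HTTP/2 allows (RFC 7540, section 6.9.1). */
#define H2_WINDOW_MAX 0x7FFFFFFFu

/* Hypothetical, simplified model of a connection's receive-window bookkeeping. */
struct conn_window {
    uint32_t pending;   /* bytes consumed since the last WINDOW_UPDATE was sent */
    uint32_t threshold; /* only send a WINDOW_UPDATE once pending exceeds this */
};

/* Called once per DATA frame received. Instead of queuing a WINDOW_UPDATE for every
 * frame, accumulate the consumed bytes and release them in one batched update. */
static bool on_data_frame(struct conn_window *w, uint32_t frame_len) {
    w->pending += frame_len;
    if (w->pending > w->threshold) {
        printf("queue one WINDOW_UPDATE for %" PRIu32 " bytes\n", w->pending);
        w->pending = 0;
        return true;
    }
    return false; /* small frame: just bookkeeping, no frame sent */
}

int main(void) {
    /* Only refill the connection window after it has dropped by 50% of the max. */
    struct conn_window conn = {.pending = 0, .threshold = H2_WINDOW_MAX / 2};

    for (int i = 0; i < 1000; ++i) {
        on_data_frame(&conn, 16384); /* 1000 small frames: no WINDOW_UPDATE yet */
    }
    on_data_frame(&conn, H2_WINDOW_MAX / 2); /* crosses the threshold: one batched update */
    return 0;
}
```

Under this rule, the 1000 small frames produce zero WINDOW_UPDATE frames instead of one each; only crossing the 50% threshold triggers a single batched update.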

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@TingDaoK changed the title from "Canary" to "Http Client benchmark" on Aug 26, 2022

lgtm-com bot commented Sep 12, 2022

This pull request introduces 1 alert when merging 4cd7338 into f81ee94 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times


lgtm-com bot commented Sep 12, 2022

This pull request introduces 1 alert when merging f445de0 into f81ee94 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times


lgtm-com bot commented Sep 12, 2022

This pull request introduces 1 alert when merging f86a9f7 into f81ee94 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times


lgtm-com bot commented Sep 12, 2022

This pull request introduces 1 alert when merging ca0c7c5 into f81ee94 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times

@TingDaoK TingDaoK marked this pull request as ready for review September 12, 2022 23:53
@@ -1762,6 +1767,8 @@ static void s_handler_installed(struct aws_channel_handler *handler, struct aws_
aws_linked_list_push_back(
&connection->thread_data.outgoing_frames_queue, &connection_window_update_frame->node);
connection->thread_data.window_size_self += initial_window_update_size;
/* For automatic window management, we only update connectio windows when it droped blow 50% of MAX. */
connection->thread_data.window_size_self_dropped_threshold = AWS_H2_WINDOW_UPDATE_MAX / 2;
Contributor:

nit: pull this magic number into a constant?

Contributor Author:

it's derived from a constant... and it's clearer where it comes from.


codecov-commenter commented Apr 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.50%. Comparing base (ef78cb8) to head (46445c0).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #388      +/-   ##
==========================================
+ Coverage   79.40%   79.50%   +0.09%     
==========================================
  Files          27       27              
  Lines       11685    11697      +12     
==========================================
+ Hits         9279     9300      +21     
+ Misses       2406     2397       -9     


@@ -408,12 +408,6 @@ static int s_localhost_integ_h2_upload_stress(struct aws_allocator *allocator, v
s_tester.alloc = allocator;

size_t length = 2500000000UL;
#ifdef AWS_OS_LINUX
Contributor Author (@TingDaoK) commented Apr 2, 2025:

remove this, since it seems to have been the flow-control window issue:

  1. we sent out too much
  2. the initial window from the local server is small (defaults to the magic 65535): https://httpwg.org/specs/rfc7540.html#iana-settings
  3. the local server code also updates the window as data is received

Don't know why it matters that much more on Linux compared to the other platforms. (I think we did find from the canary before that this windowing issue affects Linux more, which matches this.)


list(APPEND CMAKE_MODULE_PATH "${CMAKE_INSTALL_PREFIX}/lib/cmake")

file(GLOB ELASTICURL_SRC
Contributor:

ELASTICURL

target_link_libraries(${PROJECT_NAME} aws-c-http)

if (BUILD_SHARED_LIBS AND NOT WIN32)
message(INFO " elasticurl will be built with shared libs, but you may need to set LD_LIBRARY_PATH=${CMAKE_INSTALL_PREFIX}/lib to run the application")
Contributor:

elasticurl

CMakeLists.txt Outdated
Comment on lines 87 to 89
if (AWS_BUILD_CANARY)
add_subdirectory(bin/canary)
endif()
Contributor:

why bother having an AWS_BUILD_CANARY option? why not just always build it if we're building tests, like we're already doing with elasticurl

EXPORT ${PROJECT_NAME}-targets
COMPONENT Runtime
RUNTIME
DESTINATION bin
Contributor:

Suggested change
DESTINATION bin
DESTINATION ${CMAKE_INSTALL_BINDIR}

@@ -0,0 +1,28 @@
project(canary C)

list(APPEND CMAKE_MODULE_PATH "${CMAKE_INSTALL_PREFIX}/lib/cmake")
Contributor:

not necessary anymore

Suggested change
list(APPEND CMAKE_MODULE_PATH "${CMAKE_INSTALL_PREFIX}/lib/cmake")

raise RuntimeError("Return code {code} from: {cmd}".format(
code=process.returncode, cmd=args_str))
else:
print(output.decode("utf-8"))
Contributor:

Wait.
We print the output even if everything went right?
In that case, just DON'T capture output at all. It will print to the console. You don't need to pass anything to subprocess.run() except args and timeout.

Also remove the comment about "gather all stderr and stdout to a single string that we print only if things go wrong",

and remove like 80% of the code in this script, because the whole reason it's so complicated was to suppress output if the test passed.

Contributor:

I've been looking at this PR a while and have no idea what "canary" does. Can you add a README.md with a very brief description and very brief instructions for running it?

@@ -51,6 +51,7 @@ def __init__(self):
def connection_made(self, transport: asyncio.Transport):
self.transport = transport
self.conn.initiate_connection()
self.conn.increment_flow_control_window(int(2147483647/2))
Contributor:

add a comment explaining why you're doing this, and why this magic number

Comment on lines 71 to +74
if isinstance(event, RequestReceived):
self.request_received(event.headers, event.stream_id)
self.conn.increment_flow_control_window(
int(2147483647/2), event.stream_id)
Contributor:

why this change? I understand you need to increment the window after DataReceived (below). But why this change?

Comment on lines +76 to 79
self.conn.increment_flow_control_window(event.flow_controlled_length)
self.conn.increment_flow_control_window(
event.flow_controlled_length, event.stream_id)
self.receive_data(event.data, event.stream_id)
Contributor:

why did you move these calls from the def receive_data() function, out to here? Seems weird to make a function to handle everything related to this event ... and then move some of the code outside that function

@@ -98,6 +98,17 @@ struct aws_h2_connection {
* Reduce the space after receiving a flow-controlled frame. Increment after sending WINDOW_UPDATE for
* connection */
size_t window_size_self;
Contributor:

trivial / kinda-related: maybe change window_size_self and window_size_peer from size_t to uint32_t

they're not legally allowed to exceed 2^31-1. Having them be variably sized is just ... confusing

* received.
* When manual management for connection window is on, the dropped size equals to the size of all the padding in
* the data frame received */
uint32_t window_size_self_dropped;
Contributor:

trivial / naming: "dropped" doesn't make immediate sense to me. maybe:

  • pending_window_update_size and window_update_threshold?
  • window_increment_pending_size and window_increment_threshold_size?

@@ -384,6 +384,8 @@ static struct aws_h2_connection *s_connection_new(
connection->thread_data.window_size_peer = AWS_H2_INIT_WINDOW_SIZE;
connection->thread_data.window_size_self = AWS_H2_INIT_WINDOW_SIZE;

connection->thread_data.window_size_self_dropped_threshold = 0;
Contributor:

Shouldn't this have a non-zero value?

even if someone is doing "manual window management" on their stream, the HTTP client should still batch up its window update frames

CONNECTION_LOGF(TRACE, connection, "%" PRIu32 " Bytes of padding received.", total_padding_bytes);
}
connection->thread_data.window_size_self_dropped += auto_window_update;
if (connection->thread_data.window_size_self_dropped > connection->thread_data.window_size_self_dropped_threshold) {
Contributor:

This logic holds back a WINDOW_UPDATE frame until its size would be > the threshold.

Should there be another threshold, where we also send it immediately if window_size_self gets too low?

also trivial, but maybe >= instead of >, since the threshold is probably going to be a nice round number
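
For illustration, a minimal sketch of that dual check, assuming made-up constant names and values (not aws-c-http's actual ones):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: batch updates until this many bytes have accumulated... */
#define PENDING_UPDATE_THRESHOLD (0x7FFFFFFFu / 2)
/* ...but flush early if the advertised self window has already shrunk below this floor. */
#define MIN_SELF_WINDOW (64u * 1024u)

static bool should_send_window_update(uint32_t pending_bytes, uint32_t window_size_self) {
    if (pending_bytes == 0) {
        return false; /* nothing to release */
    }
    /* >= rather than >, since the threshold is likely a nice round number. */
    if (pending_bytes >= PENDING_UPDATE_THRESHOLD) {
        return true;
    }
    /* Second threshold: the window is nearly exhausted, so send what we have now
     * rather than letting the peer stall while we wait for a bigger batch. */
    return window_size_self <= MIN_SELF_WINDOW;
}

int main(void) {
    /* Pending bytes are well under the batch threshold, but the window is nearly
     * empty, so the update goes out immediately. */
    return should_send_window_update(32u * 1024u, 16u * 1024u) ? 0 : 1;
}
```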

@@ -1762,6 +1767,8 @@ static void s_handler_installed(struct aws_channel_handler *handler, struct aws_
aws_linked_list_push_back(
&connection->thread_data.outgoing_frames_queue, &connection_window_update_frame->node);
connection->thread_data.window_size_self += initial_window_update_size;
/* For automatic window management, we only update connection windows when it droped blow 50% of MAX. */
Contributor:

trivial

Suggested change
/* For automatic window management, we only update connection windows when it droped blow 50% of MAX. */
/* For automatic window management, we only update connection window when it drops below 50% of MAX. */

return aws_h2err_from_last_error();
}
connection->thread_data.window_size_self_dropped = 0;
Contributor:

Should we move this logic into s_connection_send_update_window() ? I know it's only called from this one place, but it seems like if anywhere else ever wanted to call it, they should be minding all this math as well. I guess if we do that, we should rename it just s_connection_update_window()


if (auto_window_update != 0) {
if (s_connection_send_update_window(connection, auto_window_update)) {
if (total_padding_bytes) {
Contributor:

utterly trivial: put some whitespace between the if/else and the next if. Otherwise it looks like an if/else-if/else-if chain

Suggested change: add a blank line before
if (total_padding_bytes) {

if (auto_window_update != 0) {
if (s_connection_send_update_window(connection, auto_window_update)) {
if (total_padding_bytes) {
CONNECTION_LOGF(TRACE, connection, "%" PRIu32 " Bytes of padding received.", total_padding_bytes);
Contributor:

Maybe just remove this LOG. The old statement was about how the connection window was being updated, but now it's just about how much padding was received, and the decoder is already logging about padding.

Comment on lines 753 to 754
}
if (with_data) {
Contributor:

trivial: whitespace after logical if-block please (or if/else block, or if/elseif/elseif block)

Suggested change: add a blank line between
}
if (with_data) {

Comment on lines 751 to 752
stream->thread_data.window_size_self_dropped_threshold =
connection->thread_data.settings_self[AWS_HTTP2_SETTINGS_INITIAL_WINDOW_SIZE] / 2;
Contributor:

similar feedback to the connection window batching:

  • we should probably still batch up WINDOW_UPDATE frames even if the user is doing it "manually", because most users will do one update for each read(), just like we were doing in our "automatic" behavior, which was causing the performance problems
  • we probably want to send the updates anyway if the total window size is too small

maybe just hardcode these 2 magic numbers: the preferred batch size, and the window size at which we don't bother batching, and use the same values for both stream and connection behavior, whether it's in automatic or manual mode
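
A short sketch of what those two shared numbers might look like, with hypothetical names and values, applied the same way to the connection and to each stream regardless of automatic or manual mode:

```c
#include <stdint.h>

/* Hypothetical shared constants: the preferred WINDOW_UPDATE batch size, and the
 * window size below which we stop batching and send updates immediately. */
#define WINDOW_UPDATE_BATCH_SIZE (256u * 1024u)
#define WINDOW_UPDATE_MIN_WINDOW (64u * 1024u)

/* Hypothetical per-window bookkeeping, reused by the connection and by every stream,
 * in both automatic and manual window-management modes. */
struct window_batching {
    uint32_t pending_update_size; /* bytes waiting to be released in one WINDOW_UPDATE */
    uint32_t update_threshold;    /* batch until this much has accumulated */
    uint32_t min_window;          /* below this, send immediately */
};

static void window_batching_init(struct window_batching *w) {
    w->pending_update_size = 0;
    w->update_threshold = WINDOW_UPDATE_BATCH_SIZE;
    w->min_window = WINDOW_UPDATE_MIN_WINDOW;
}

int main(void) {
    struct window_batching conn_window, stream_window;
    window_batching_init(&conn_window);   /* same values for the connection window... */
    window_batching_init(&stream_window); /* ...and for each stream window */
    return (conn_window.update_threshold == stream_window.update_threshold) ? 0 : 1;
}
```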

@TingDaoK changed the title from "Http Client benchmark" to "Update windowing algo & Http Client benchmark" on Apr 7, 2025
@TingDaoK changed the title from "Update windowing algo & Http Client benchmark" to "Update h2 windowing algo & Http Client benchmark" on Apr 7, 2025
@TingDaoK TingDaoK marked this pull request as draft April 8, 2025 22:58